Annals of Internal Medicine
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Background: Long COVID affects millions worldwide, yet the long-term trajectory of healthcare costs remains poorly characterized. Prior studies with limited follow-up have documented elevated but stable excess costs, leaving uncertainty about whether the economic burden attenuates or persists over time. Methods: We conducted a retrospective cohort study using electronic health record data from 12 hospitals and 20 community health centers (January 2018 through December 2024). Adults with documented ...
COVID-19 has been shown to cause a range of harmful long-term effects on nearly every organ system [1-3]. These findings are based on retrospective studies comparing COVID-19 patients to patients with similar medical histories and demographics but no COVID-19 diagnosis [4-16]. However, concerns have emerged that these comparisons may be biased if COVID-19 patients had unrelated health conditions or other factors not recorded in their medical records [17-21]. Here, using a massive dataset of 14.4 billion ...
Background: Delivering timely, high-quality feedback on resident scholarly projects is labour-intensive, especially in large programmes. We developed an AI-assisted evaluation system, powered by the open-weight LLaMA-3.1 large language model (LLM), to generate formative feedback on Family Medicine residents' scholarly projects and compared its performance with expert human evaluators. Methods: We evaluated whether the AI-generated feedback achieves comparable quality to expert feedback. The tool ing...
Abstract: Accurate health information is ineffective if patients cannot understand it. Large Language Model (LLM) health research values veridical precision; however, linguistic accessibility remains an under-examined component of output quality and usability. This study investigated two sources of variability in readability classification: differences across LLM systems and across readability metrics. The analysis tested 1,120 data points from seven systems in English and Portuguese, comparing ba...
This exploratory analysis of PAX LC, a Phase 2, 1:1 randomized, double-blind, superiority, placebo-controlled trial, examined whether treatment with nirmatrelvir/ritonavir (NMV/r) versus placebo/ritonavir (PBO/r) in individuals with Long COVID could reveal immune features associated with symptom improvement. Eighty-two participants (n=45 PBO/r; n=37 NMV/r) provided blood samples at baseline (Day 0) and post-treatment (Day 28). Baseline demographic and immunological phenotypes were similar in the ...
Background: Retrieval-augmented generation (RAG) frameworks such as RAPID [1] have demonstrated that staged planning and retrieval grounding improve long-form text generation. However, most implementations remain similarity-driven and open-domain, lacking the epistemic safeguards required for biomedical synthesis, where mechanistic completeness, temporal governance, traceability, and explicit gap classification are essential. Objective: To develop and evaluate a topology-aware, graph-augmented retr...
Rationale: Obstructive sleep apnea (OSA) is linked to cardiovascular, metabolic, and cognitive morbidity. Although COVID-19 has been associated with long-term respiratory and neurological sequelae, its role in precipitating new-onset OSA remains unclear. Objectives: To evaluate whether SARS-CoV-2 infection increases risk of developing OSA up to 4.5 years post-infection and how risk varies by hospitalization status, demographics, comorbidities, and vaccination status. Methods: This retrospective coho...
Purpose: To evaluate the short- and long-term cross-sectional associations between COVID-19 infection and multidimensional sleep health. Methods: Data from the COVID-19 Outbreak Public Evaluation (COPE) initiative were used to examine the association between a novel multidimensional sleep health measure (COPE Multidimensional Sleep Health Scale, CMSHS) modeled from the RuSATED instrument and (1) COVID-19 infection and (2) post-acute sequelae of SARS-CoV-2 infection (PASC). Results: Data from 11,326...
Background: Preoperative biliary stenting alters biliary colonization and may reduce the effectiveness of perioperative antibiotic prophylaxis in pancreatoduodenectomy. Although broader-spectrum regimens have been associated with improved infectious outcomes, their microbiological adequacy in routine clinical practice remains poorly defined. We therefore evaluated the real-world adequacy of a prolonged ampicillin-sulbactam protocol, its association with infectious outcomes and survival, and the po...
Background: Reliable identification of early predictors of adverse outcomes was essential during the pre-vaccination phase of the COVID-19 pandemic. Few studies have comprehensively integrated clinical presentation, laboratory parameters including arterial blood gas analysis, and chest computed tomography (CT) findings within a single well-characterized cohort, particularly in underrepresented regions of Brazil. Methods: This retrospective cohort study included 482 consecutive adults (median age 61...
Background: Systematic reviews (SRs) are essential for evidence-based medicine but require extensive time and resources for abstract screening. Large language models (LLMs) offer potential for automating this process, yet concerns about data privacy, intellectual property protection, and reproducibility limit the use of cloud-based solutions in research settings. Objective: To evaluate the performance of a locally deployed 20-billion-parameter LLM for automated abstract screening in systematic revi...
Large language models (LLMs) are increasingly used by the public to seek health information, yet their reliability in addressing common vaccine myths remains unclear. We conducted an exploratory multi-vendor evaluation of three LLMs (GPT-5, Gemini 2.5 Flash, Claude Sonnet 4) using officially curated vaccination myths from Germany's public health institution and two realistic user framings as prompts: a curious skeptic and a convinced believer. All model responses were independently evaluated by t...
Background: The 2024 blood culture bottle shortage brought diagnostic resource allocation to the forefront, reflecting persistent, foundational challenges with low-value testing and empiric treatment approaches under clinical uncertainty. Objective: To determine whether a machine learning approach using electronic medical record data can predict bacteremia more effectively than existing systems and practices to guide diagnostic testing and empiric treatment strategies. Methods: In a retrospective co...
Background: Large language models (LLMs) are increasingly piloted as chat interfaces for chart review and clinical decision support. Although leading models achieve and even exceed physician-level accuracy on exam-style benchmarks such as MedQA, recent perturbation studies show large drops in accuracy after small changes to prompts, distractor content, or answer format. Prior work has not systematically examined how these vulnerabilities unintentionally manifest in clinically realistic settings, i...
Background: The effects of Banff histological diagnoses on kidney transplant outcome have been well characterized. However, repeated observation of such histological injury across multiple biopsies in kidney transplant recipients remains insufficiently explored. Methods: In an observational cohort (N=1819 transplantations with 5736 post-transplant biopsies), recurrent event survival models quantified transitions between diagnoses of T-cell mediated rejection (TCMR), antibody-mediated rejection (AMR)...
Objective: To re-estimate and re-validate the Area Deprivation Index to address recent criticism of the existing index, which is calculated and distributed by Neighborhood Atlas. Data Sources: To calculate the updated Area Deprivation Index (ADI), we obtained 17 census measures from the 2018-2022 American Community Survey (ACS) 5-year data that reflected poverty, housing, employment, and education within census block groups, census tracts, and counties. To validate the association of the updated in...
Purpose: Large language models (LLMs) are used for biomedical text processing, but individual decisions are often hard to audit. We evaluated whether enforcing a mechanically checkable "show your work" quote affects accuracy, stability, and verifiability for trial eligibility-scope classification from abstracts. Methods: We used 200 oncology randomized controlled trials (2005-2023) and provided models with only the title and abstract. Trials were labeled with whether they allowed for the inclusio...
Large language models (LLMs) are increasingly transforming scientific workflows, yet their application to rigorous evidence synthesis remains underexplored. Through the execution of a single Python script, we present a fully automated pipeline leveraging the Claude API to generate systematic reviews from literature search through manuscript completion without human intervention. Our pipeline processes hundreds of papers through iterative API calls for inclusion evaluation, information extraction...
Objective: This study explored the recovery experiences of individuals who report having (largely) recovered from long covid and who attributed their improvement to mind-body approaches. Design, setting and participants: We conducted an explorative qualitative study using purposive recruitment through social media and snowball sampling. Eighteen adult women (aged 37-62 years), who self-identified as having had long covid and having substantially recovered through mind-body approaches participated i...
Background: Improving tuberculosis (TB) treatment success is critical for improving the health of individuals with TB, reducing transmission, and lowering treatment costs. We conducted a four-arm randomized controlled trial (RCT) to evaluate whether three digital interventions with increasing support improved treatment outcomes compared to the standard of care. Methods: In this open-label, parallel RCT in Kenya, all TB patients at 902 participating clinics who had at least 2 months of treatment rem...